The PowerPC
Volume Number: 10
Issue Number: 2
Column Tag: Powering Up
From CISC to RISC
By Richard Clark & Jordan Mattson, Apple Computer, Inc.
The Heart of the Next Generation
The forthcoming generation of Macintosh systems will be powered by the
PowerPC family of RISC microprocessors. Apple’s decision to make this change wasn’t
undertaken lightly. This month’s Powering Up will examine the differences between
CISC and RISC, take a look at the PowerPC family of microprocessors, and close with an
overview of the architecture of the first PowerPC implementation - the PowerPC
“601”. This should help explain why Apple is making such a dramatic change to the
Macintosh product line.
A brief history of the (CISC) universe
The earliest microcomputers were designed to be easy to program in assembly
language and to conserve memory, which was expensive and slow to access. (They
were also constrained by the limited manufacturing techniques available.) This led
to chips that had:
• Very few registers - often only an “accumulator” and one or two
general-purpose registers
• “Complex” instructions that allowed assembly language programmers to write
programs using a small number of these instructions instead of a large number of
simpler instructions (this also conserved memory)
• “Variable length” instructions where the instruction (often 1 byte long) would
be followed by the information needed by that instruction
• Multiple styles of accessing memory, known as “addressing modes,” which
allowed programmers to access individual locations directly, via a pointer, by an
offset to a pointer, by combining pointers, and so on
These processors also executed instructions serially - each instruction had to
complete before the next instruction could begin.
As microprocessors evolved, from 8 bits to 16 bits to 32 bits, each new
generation added more registers, more addressing modes, and new instructions; some
chips even added a limited form of pipelining - the ability to work on several
instructions at once. But the basic design was still oriented towards conserving
memory and serving the needs of the assembly-language programmer, often at the
expense of speed.
Enter RISC
In the early 1980s, several designers noticed that microprocessor design hadn’t
kept up with the rest of the system. Memory was faster and much less expensive,
assembly language had largely been replaced by “high-level languages” such as C and
Pascal, and existing designs were pushing the limits of what could be manufactured. So
they went back to the drawing boards, and came out with simpler designs that were
optimized for speed and for use with high-level languages. These new designs used
instruction sets built from simple instructions, and thus were dubbed “reduced”
instruction set computers.
While the exact meaning of “RISC” is still a subject for debate, most RISC
designs include:
• A large number of general purpose registers, and few special-purpose registers
• Instruction sets which are well matched to the needs of compilers, and which
contain many “simple” instructions
• Instructions which fit completely in a single “word” (including the data used by
the instruction), and which are encoded in an easy-to-process format
• A “load/store” architecture, where information has to be loaded into registers
before it can be used
• A small number of memory addressing modes, often only one or two, which use a
pointer in one of the registers
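To make the "single word, easy to process" point concrete, here is a minimal sketch in C that picks apart one fixed-length, 32-bit instruction word. We use the PowerPC add-immediate (addi) layout as the example: a 6-bit primary opcode, two 5-bit register numbers, and a 16-bit signed constant carried inside the instruction itself. The struct and function names (DForm, decode_dform) are our own, chosen for illustration only.

```c
#include <stdint.h>

/* Fields of one 32-bit instruction word in the add-immediate layout.
   (Illustrative type; the name is not taken from any Apple header.) */
typedef struct {
    unsigned opcode;  /* primary opcode: top 6 bits            */
    unsigned rd;      /* destination register: next 5 bits     */
    unsigned ra;      /* source register: next 5 bits          */
    int      simm;    /* signed immediate value: low 16 bits   */
} DForm;

/* Every field sits at a fixed position, so decoding is just
   shifts and masks - no need to scan a variable-length stream. */
DForm decode_dform(uint32_t word)
{
    DForm f;
    f.opcode = (unsigned)((word >> 26) & 0x3F);
    f.rd     = (unsigned)((word >> 21) & 0x1F);
    f.ra     = (unsigned)((word >> 16) & 0x1F);
    f.simm   = (int)(word & 0xFFFF);
    if (f.simm & 0x8000) {
        f.simm -= 0x10000;  /* sign-extend the 16-bit constant */
    }
    return f;
}
```

For example, the word 0x38640001 decodes as opcode 14 (addi) with destination register 3, source register 4, and the constant 1. A CISC decoder, by contrast, often has to read the first byte of an instruction before it knows how many more bytes follow.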
These features allow most RISC implementations to apply a few simple techniques
to get maximum performance:
• Pipelining, so the processor can process multiple instructions simultaneously
• Memory caches, which provide faster access to instructions and data than system
RAM or ROM
• Restrictions on data alignment, where the processor requires that all two-byte
values be aligned on an even address, all four-byte values be aligned on an address
that is a multiple of four, and so on.
The PowerPC is a RISC design which has all of these “common” RISC features,
except that it relaxes the rules for memory alignment.
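The alignment rule described above amounts to a simple bit test on the address. The following is a minimal sketch in C; the helper names (is_aligned, align_up) are ours, for illustration only, and both assume the size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of size, 0 otherwise.
   Assumes size is a power of two (2, 4, 8, ...). */
int is_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Rounds addr up to the next multiple of size.
   Assumes size is a power of two. */
uintptr_t align_up(uintptr_t addr, size_t size)
{
    return (addr + (size - 1)) & ~(uintptr_t)(size - 1);
}
```

On a processor that enforces strict alignment, a four-byte access at address 0x1002 would be rejected because is_aligned(0x1002, 4) is false; a processor that relaxes the rule accepts the access, though possibly at some cost in speed.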
The PowerPC - An Overview
The PowerPC architecture is a collaborative effort of Apple, IBM, and Motorola to
create a new generation of high-performance microprocessors which can be used in
everything from personal computers, workstations, servers, and multiprocessor
systems to embedded microcontrollers.
The PowerPC is based on IBM’s highly successful POWER architecture. The
POWER architecture was designed for scientific workstations, and has been optimized
for both integer and floating-point math operations. The POWER architecture also
incorporates a “branch processor” which attempts to minimize the impact of branch
instructions on the processor’s performance.
When the Apple-IBM-Motorola consortium set out to design PowerPC, the
members modified the POWER architecture to reduce manufacturing costs and make the
design more suitable for desktop computers. They eliminated parts of the POWER
instruction set that made the POWER architecture more difficult to implement but had
a minimal impact on performance. While the architects were modifying the
instruction set for the architecture, they also removed dependencies between
instructions, and added features which simplified building multi-processor systems.
The result of these changes is a low-cost, high-performance RISC
architecture with:
• Fixed length, consistently encoded instructions
• A register-to-register (load/store) architecture, with support for aligned data
accesses, misaligned data accesses, and both big-endian and little-endian data
• A “simple” instruction set, with instructions which may be tailored to the task
at hand (for example, setting the condition codes at the end of an arithmetic
operation is an option, not a requirement)
• Simple, yet powerful, addressing modes applied consistently across the
instruction set
• A large register set which includes both general-purpose and floating-point
registers
• Floating-point as a first-class data type. This means that floating-point is a
standard part of the architecture and therefore is better integrated than it is in
many other RISC architectures
Some of these features - notably the misaligned data support and the dual
big-endian / little-endian support - are unusual in a RISC design, but were required
to support past and future Macintosh designs.
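For readers who have not met the terms, "big-endian" and "little-endian" describe the order in which the bytes of a multi-byte value are stored in memory. The sketch below, in C, assembles the same four bytes both ways; the function names (read_be32, read_le32) are our own and are not part of any Apple interface.

```c
#include <stdint.h>

/* Big-endian: the most significant byte is stored first. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian: the least significant byte is stored first. */
uint32_t read_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Given the bytes 12 34 56 78 (hexadecimal) in memory, read_be32 yields 0x12345678 while read_le32 yields 0x78563412. Because these routines fetch one byte at a time, they also work at any address, aligned or not; hardware support for both byte orders and for misaligned accesses lets the processor do the equivalent work in a single load.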
The PowerPC Family of Microprocessors
The PowerPC family currently has the following four members:
601 - The 601 is a fusion of the POWER architecture and the PowerPC
architecture. It is designed to drive mainstream desktop systems. A Macintosh with a
601 will deliver integer performance three to five times that of today’s high-end
68040-based Macintosh systems and floating point performance around ten times that
of today’s high-end 68040-based Macintosh systems.
603 - The 603 is the first PowerPC-only implementation of the PowerPC
architecture. It is designed for low cost and low power consumption. The 603 will be
used in portable and low-cost desktop Macintosh with PowerPC systems. Over time,
the 603 could in many ways become Apple’s replacement for the 68030.
604 - The 604 is designed for mainstream desktop personal computers. It will
cost about as much as the 601, but will deliver higher performance.
620 - The 620, which is still in the design phase, is a
high-performance microprocessor that Motorola and IBM believe will be well-suited
for very high-end personal computers, workstations, servers, and multiprocessor
systems.
The PowerPC 601 in Context and why Apple likes RISC
Many developers and customers have been asking how the 601 stacks up against
Intel’s state-of-the-art CISC design, the “Pentium.” On a basis of price,
performance, and power consumption, the PowerPC 601 compares quite favorably. As
you can see from Table 1, the 601 delivers integer performance that roughly matches and
floating-point performance that exceeds Pentium’s for about half the cost. In addition,
it consumes about two-thirds the power of Pentium.
Table 1 - Pentium and PowerPC 601 Compared
              Pentium      PowerPC 601
Frequency     66 MHz       66 MHz
Die Size      264 mm²      120 mm²
Cache         16K          32K
Power         14 Watts     9 Watts
SPECint92     64           60
SPECfp92      57           80
Price         $950.00      $450.00
This comparison should give you some idea why Apple is staking such a large part
of its future on RISC. The PowerPC 601 is the first of its generation (though it does
descend from previous RISC architectures), yet matches the performance of the latest
CISC chips - and the next PowerPC implementation (603) is well under way. While
CISC designers have to work increasingly hard to squeeze more performance out of
their designs, at an ever increasing manufacturing cost, RISC designs have
considerable room for growth. The evolution of RISC designs has the potential to
outstrip the evolution of CISC.
A Quick Tour of the 601
Every PowerPC design begins with the fundamental architecture shown in Figure
1, with some chip-specific details. For example, the 601 incorporates a single 32K
cache which holds both instructions and data, while other PowerPC models are likely to
separate the two caches as shown. Also, future implementations may include multiple
arithmetic logic units in both the fixed-point and floating-point units, allowing
multiple arithmetic operations to proceed simultaneously.
Figure 1 - A General Diagram of the PowerPC Architecture
Each of these units has a specific purpose:
• The Branch Unit collects instructions from the Instruction Queue, then locates
and removes any branches from the instruction stream before sending
instructions to the Fixed-Point and Floating-Point units. Unconditional branches
can be removed from the instruction stream, while conditional branches (for example,
those that are part of an “if” statement or a loop) might require the branch unit to “predict”
the outcome of the branch. In any case, the branch unit tries to provide an
uninterrupted stream of instructions to the units downstream.
• The Fixed-Point unit holds the 32 General-Purpose registers (including one
which is used as the Stack Pointer, and another which resembles register A5 in a
68K-based Macintosh). Each register is one “word” wide, where a word is 32
bits on a 32-bit PowerPC (601/603/604) and 64 bits on the 620
implementation.
The fixed-point unit also holds the Fixed-Point arithmetic unit. This unit
implements the standard addition, subtraction, multiplication, and division
operations, as well as some comparison, logical, and shift/rotate instructions.
On the 601, the Fixed-Point unit also manages the transfers of data
between memory (the Data cache) and the internal registers. This function may
be implemented in a separate functional unit on future PowerPC
implementations. (Note that even though the Fixed-Point unit manages load and
store operations, data cannot be transferred directly between the Fixed-Point and
Floating-Point units - the transfer must go through memory.)
The Fixed-Point unit also serves to calculate addresses for use by the
Branch Unit.
• The Floating-Point unit holds the 32 floating-point registers and the
Floating-Point arithmetic unit. Each register is 64 bits wide (a “double
precision” floating-point value), but can hold single-precision (32-bit) values
as well.
The Floating-Point unit implements addition, subtraction, multiplication, and
division, combining addition/subtraction and multiplication into a single
“multiply and accumulate” unit. This design fits well with most scientific
computing needs, where a common operation involves multiplying two values and
then adding the result to a running total.
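The classic case of "multiply two values, then add the result to a running total" is the dot product of two vectors. Here is a minimal sketch in plain C (the function name dot_product is ours). Whether the compiler actually maps the loop body onto the multiply-and-accumulate hardware depends on the compiler and its settings, so treat this as an illustration of the pattern rather than a guarantee about the generated code.

```c
#include <stddef.h>

/* Sums a[i] * b[i] over n elements. Each pass through the loop
   is one multiply followed by one add into the running total -
   the pattern a multiply-and-accumulate unit is built for. */
double dot_product(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}
```

Calling dot_product on the vectors (1, 2, 3) and (4, 5, 6) returns 32, which is 4 + 10 + 18.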
Since the processor contains multiple functional units, each of which can
execute an instruction independently of the others, this is a variety of
“multiple-issue” design (where multiple instructions may be executed in a single
clock cycle). Under ideal conditions, the 601 can execute three instructions in a single
clock cycle - a branch instruction, a floating-point instruction, and a fixed-point instruction.
Optimizing Code for the PowerPC
One of the ways that a programmer can take advantage of the design of the
PowerPC is by instruction scheduling - arranging instructions so that each functional
unit can run without stopping to wait for information or another unit. The PowerPC
compilers are designed to use instruction scheduling to create the smallest, quickest
applications possible.
For example, a compiler has to implement an “if” statement using at least two
operations - performing a test (which sets the appropriate condition codes) followed
by a “conditional branch” instruction. Whenever possible, the compiler will schedule
some operations to occur between the test and the branch instruction, which gives the
branch unit time to see the branch coming, read the condition codes, and determine
the outcome of the branch with certainty.
Another example involves loading registers well before they are actually needed,
giving the load operation time to complete (a load may take several clock
cycles if it has to go to RAM).
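The idea of loading early can be pictured at the source level, although in practice the scheduling happens in the generated machine code. The two C functions below (the names combine_late and combine_early are ours, purely for illustration) compute the same result; the second is written in the order a scheduler would prefer. An optimizing compiler is free to reorder either one, so this is a sketch of the idea and not a recommended coding style.

```c
/* Straightforward order: the value in memory is fetched only
   at the moment it is needed, so the add may have to wait. */
int combine_late(const int *p, int a, int b)
{
    int t = a * b + a;  /* work that does not depend on *p  */
    return t + *p;      /* load issued at the point of use  */
}

/* Scheduled order: the load is started first, and the
   independent arithmetic proceeds while the load completes. */
int combine_early(const int *p, int a, int b)
{
    int v = *p;         /* begin the load as early as possible */
    int t = a * b + a;  /* independent work overlaps the load  */
    return t + v;
}
```

Both functions return the same value for the same arguments; only the order of the operations differs.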
A final example involves allocating “scratch” registers within a function. The
Runtime Architecture designates several registers as “volatile”, i.e. not saved across
function calls. The compiler can look at a group of functions which are compiled
together, determine which volatile registers a particular function leaves unchanged,
and then use those as scratch registers in the calling function.
All of these optimizations require an in-depth knowledge of the processor, and
the ability to see the entire structure of a single compiled file. The compiler writers
are able to build the myriad rules for instruction scheduling right into the compiler,
and the compiler can keep track of the code it generates. Because of this, the compiler
often generates better code than an assembly-language programmer will. In fact, Apple
suggests that programmers move their entire program into a high-level language
(probably portable ANSI C or C++) and only move to assembler those parts which
absolutely cannot be expressed in a high-level language.
Next Month in Powering Up
The second most frequently asked questions about Macintosh with PowerPC -
after, “When can I buy one?” - are “How can I program one?” and “What is the
average user going to do with that much power?” In next month’s column, we’ll take a
look at the development tools for PowerPC and some applications which show off the
PowerPC performance to good advantage.
Further reading
Space limitations have forced us to give you just the briefest of sketches on the
evolution of RISC and the design features of the PowerPC 601. For more information on
either of these topics, please consult the PowerPC RISC Microprocessor User’s
Manual, published by Motorola (part number MPC601UM/AD), which is included in
the Macintosh with PowerPC Starter Kit available from APDA, and the Programmer’s
Introduction to RISC and PowerPC CD-ROM available from APDA.